10.2 Neural networks
One of the difficulties of processing visual languages automatically is input, when it is presented in the form of images. Images are represented digitally as collections of pixels, arrayed in memory in a way that makes sense for display and storage, but which is completely disconnected from the meaning these images have for humans. Additionally, if input is handwritten, graphemes can present variations which don’t affect human understanding but which result in completely different pixel patterns. Positioning of objects, again, is based not on hard rules, but rather on visual interpretation.
For these reasons, machine learning techniques developed in the field of computer vision are necessary to adequately process logograms and graphemes. While the researcher can use any toolkit and algorithm they prefer, Quevedo includes a module to facilitate using neural networks with Quevedo datasets.
10.2.1 Darknet
Darknet is “an open source neural network framework written in C and CUDA”, developed by the inventor of the YOLO algorithm, Joseph Redmon. This framework includes a binary and a linked library which make configuring, training, and using neural networks for computer vision straightforward and efficient.
The neural network module included with Quevedo needs darknet to be available. This module automatically prepares network configuration and training files from the metadata in the dataset, and can manage the training and prediction process.
10.2.1.1 Installation
We recommend using the fork by Alexey Bochkovskiy: https://github.com/AlexeyAB/darknet. Installation can vary depending on your environment, including the CUDA and OpenCV (optional) libraries installed, but with luck, something like the following will work:
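    # Clone the fork and build it; adjust the Makefile flags
    # (discussed below) before running make
    git clone https://github.com/AlexeyAB/darknet
    cd darknet
    make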
In the Makefile, you probably want to enable GPU=1 and CUDNN=1, otherwise training will be too slow. Depending on the GPU available and CUDA installation, you might need to change the ARCH and NVCC variables. For Quevedo to use Darknet, it is also necessary to set LIBSO=1 so the linked library is built. Finally, if you want to use Darknet’s data augmentation, you probably want to set OPENCV=1 to make it faster.
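For reference, a typical set of Makefile changes for a GPU build usable from Quevedo might look like this (the exact ARCH and NVCC values depend on your GPU and CUDA installation):

    # Darknet Makefile settings (illustrative)
    GPU=1
    CUDNN=1
    OPENCV=1
    LIBSO=1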
After darknet is compiled, a binary (named darknet) and a library (libdarknet.so on Linux) will be built. Quevedo needs to know where these files are, so in the [darknet] section of the configuration, the path to the binary and library must be set. By default, these point to a darknet directory in the current directory. Some additional arguments to the darknet binary for training can be set in the options key.
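A minimal sketch of this section, assuming darknet has been cloned and built in the default location (the key names path and library are illustrative, following the description above; options is the key named in the text):

    # [darknet] section of config.toml (key names assumed for illustration)
    [darknet]
    path = "darknet/darknet"             # path to the darknet binary
    library = "darknet/libdarknet.so"    # path to the linked library
    options = [ "-dont_show" ]           # extra arguments passed when training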
10.2.2 Network configuration
Neural networks are ideal for dealing with image data, due to their ability to find patterns and combinations of patterns. Quevedo can help with preparing the configuration and training files needed to train darknet neural networks, can launch the actual training, and can compute evaluation metrics on the resulting network weights. It can also be used as a library, so trained networks can be employed in applications, not only for research.
But no network is a silver bullet for every kind of problem, and Quevedo datasets deal with different types of data with complex annotations. Therefore, Quevedo allows different network configurations to be kept in the configuration file, aiding both ensemble applications and exploration of the problem space.
To add a neural network configuration to Quevedo, add a section to the config.toml file with the heading [network.<network_name>]. The initial configuration file that Quevedo creates for every dataset contains some examples that can be uncommented and modified.
Under this heading, different options can be set, like a subject key that gives a brief description of the purpose of the network. The most important configuration option is task, which can take the values classify or detect.
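For example, a minimal network section might look like the following (the network name and subject value are made up for illustration):

    # Minimal network configuration in config.toml (illustrative)
    [network.graphemes]
    subject = "classify isolated graphemes"
    task = "classify"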
In Quevedo v1.2, a key “extend” has been added that can be used to share network configuration. If a network net_a has a key extend = "net_b", parameters from net_b will be used when no other value has been set in net_a. This can be useful to share common options when testing different networks, or to set a single source of truth for options that must be common. Since v1.3, “extend” is recursive, so a chain of configuration inheritance can be used.
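As a sketch, two networks sharing options via extend might be configured like this (names and values are illustrative; subsets and tag are described below):

    # net_b holds the shared options; net_a inherits whatever it doesn't set
    [network.net_b]
    task = "classify"
    subsets = [ "graphemes" ]

    [network.net_a]
    extend = "net_b"
    tag = "shape"    # net_a only overrides or adds this key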
10.2.2.1 Classifier
Classifier networks can be used with individual graphemes, and therefore use the data in the grapheme subsets of the dataset. Classify networks see the image as a whole, and try to find the best matching “class” from the classes they have been trained on. In Quevedo, classify networks are built with the AlexNet (Krizhevsky, Sutskever, and Hinton 2017) architecture, a CNN well suited to the task.
10.2.2.2 Detector
Detector networks try to find objects in an image, and therefore are well suited for finding the different graphemes that make up a logogram. Apart from detecting the bounding boxes of the different objects, they can also classify the objects themselves. Depending on the nature and complexity of the data, classification of graphemes can be performed by the same network that detects them within a logogram, or it may be better to split it into a separate classifier network (or several). The detector network architecture used by Quevedo is YOLOv3 (Redmon and Farhadi 2018).
10.2.2.3 Tag selection
Since Quevedo datasets support a multi-tag annotation schema, a single “class”/“label” has to be selected for the networks in order to perform classification (including detector networks, since they have a classification step). By default the first tag of the tag schema will be used, but another tag can be selected by writing tag = "some_tag_in_the_schema". A combination of tags can be used by listing them, for example tag = [ "some_tag", "some_other_tag" ]. This will produce a single label for each grapheme by combining the values of the tags with an underscore in between (for example, the values “circle” and “up” would yield the label “circle_up”), and train and evaluate the network with that single label.
10.2.2.4 Annotation selection
To specify which subsets of data to use for training and testing of a neural network, we can list their names in the subsets option. Additionally, we might want to select only some logograms or graphemes for a particular network based on the tag values. We can do this by leaving the relevant tags empty in the annotations: when the tags a network uses are empty, Quevedo will skip the annotation.
In classify networks, finer control can also be achieved using a “filter” section in the network configuration. This filter accepts a criterion key which determines what tag from the annotation schema to use to select annotations. Then, an include or exclude key can be set to the list of values to filter. When include is used, if a grapheme is tagged with any of the values in the list, it is included for training and testing; otherwise it is ignored. With exclude, the reverse happens.
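An illustrative filter section, assuming it lives under the network heading as [network.<network_name>.filter] (the tag name and values are made up):

    # Only graphemes whose "shape" tag has one of these values are used
    [network.graphemes.filter]
    criterion = "shape"
    include = [ "circle", "square" ]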
10.2.2.5 Data augmentation
Recent versions of darknet include automatic data augmentation that happens “on the fly”, while the network is being trained. This data augmentation is not based on semantics of the images, but on image properties like contrast or rotation. By slightly and randomly modifying the images that the network is trained on, overfitting can be avoided and better generalization achieved. Some relevant options for grapheme and logogram recognition are supported by Quevedo, and if set in the network configuration will be written into the Darknet configuration file.
The header to use is [network.<network_name>.augment], and the options supported are angle (randomly rotate images up to this number of degrees), exposure (change the brightness of the image), flip (if set to 1, images are sometimes flipped), and, only for classify networks, aspect, which modifies the grapheme width/height ratio.
In visual writing systems, not all of these transformations are meaning-preserving, so by default they are all disabled, and the user can choose which options make sense for their particular use case and data.
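For instance, an augmentation section enabling mild rotation and brightness changes might look like this (the values are illustrative, not recommendations):

    # Data augmentation options, written into the darknet configuration
    [network.graphemes.augment]
    angle = 5        # rotate up to 5 degrees either way
    exposure = 0.5   # vary image brightness
    flip = 0         # mirroring often changes meaning in writing systems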
10.2.3 Usage
10.2.3.1 At the command line
Once the network has been configured, the files necessary for training it can be created by running prepare. This will create a directory in the dataset, under networks, with the name of the neural network. By default, Quevedo will use the neural network marked with default = true, so to use a different one pass the option -N <network> (since this is an option common to many commands, it must be given after the quevedo binary name but before the command).
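For example, to prepare the training files for a particular network (the network name is a placeholder):

    # run from inside the dataset; -N selects the network
    quevedo -N graphemes prepare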
Once the directory with all the files needed for training has been created, a simple invocation of train will launch the darknet executable to train the neural network. This command can be interrupted, and if enough time has passed for some partial training weights to have been saved, it can later be resumed by calling train again (to train from zero, use --no-resume).
The weights obtained by the training process will be stored in the network directory with the name darknet_final.weights. This is a darknet file that can be used independently of Quevedo.
To evaluate the results, the test command can be used, which will get the predictions from the net for the annotations marked as “test” (see split) and output some metrics, optionally along with the full predictions as a CSV file so that finer metrics or visualizations can be computed with other tools.
The predict command can be used to directly get the predictions from the neural network for some image, not necessarily one in the dataset.
Since commands can be chained, a full pipeline of training and testing the net can be written as a single invocation, for example:
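    # illustrative chained invocation: prepare files, train, then evaluate
    quevedo -N graphemes prepare train test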
10.2.3.2 At the web interface
Trained neural networks can also be used from the web interface (Section 10.7.3.0.1). Networks for detection will be available for logograms, and classifier ones for graphemes; they will be listed at the top right of the interface. When one is run, the current annotation image is fed to the neural network and the predictions are applied (but not saved until the user presses the save button). This can be used to visualize the neural network’s results, or to bootstrap manual annotation of logograms and graphemes.